

Search for: All records

Creators/Authors contains: "Ghosh, Sudipto"


  1. Grewe, Lynne L.; Blasch, Erik P.; Kadar, Ivan (Eds.)
    Sensor fusion combines data from a suite of sensors into an integrated solution that represents the target environment more accurately than any individual sensor can. New developments in Machine Learning (ML) algorithms are increasing the accuracy, precision, and reliability of sensor fusion, but these gains come with increased system costs. Aircraft sensor systems have limited computing, storage, and bandwidth resources and must balance monetary, computational, and throughput costs, sensor fusion performance, aircraft safety, data security, robustness, and modularity as system objectives while meeting strict timing requirements. Trade studies of these system objectives should be performed before incorporating new ML models into the sensor fusion software, and a scalable, automated solution is needed to quickly analyze how allocating additional resources to new inference models affects the system’s objectives. Given that most of the aerospace industry relies on model-based systems engineering (MBSE) to design aircraft mission systems, leveraging these system models can provide the scalability the needed analyses require. This paper proposes adding empirically derived recurrent neural network (RNN) sensor fusion performance and cost measurements to machine-readable Model Cards. Furthermore, it proposes a scalable, automated sensor fusion system analysis process that ingests SysML system model information together with the RNN Model Cards. The value of this process is the integration of data analysis and system design, which enables rapid enhancement of sensor system development.
    Free, publicly-accessible full text available June 14, 2024
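    To illustrate the Model Card idea above, here is a minimal sketch, not the paper's implementation: the ModelCard and SystemBudget dataclasses and the fits_budget helper are names invented for illustration, showing how empirically measured performance and cost fields attached to a machine-readable Model Card could be checked against resource limits exported from a SysML system model.

```python
# Illustrative sketch only: hypothetical Model Card with empirically measured
# performance/cost fields, checked against resource limits that would come
# from the SysML system model.
from dataclasses import dataclass

@dataclass
class ModelCard:
    name: str
    accuracy: float        # measured fusion accuracy on the evaluation set
    latency_ms: float      # measured per-inference latency
    memory_mb: float       # peak working-set size during inference
    bandwidth_mbps: float  # sensor-data throughput the model consumes

@dataclass
class SystemBudget:
    max_latency_ms: float      # timing limit from the mission-system model
    max_memory_mb: float       # memory limit from the mission-system model
    max_bandwidth_mbps: float  # bandwidth limit from the mission-system model

def fits_budget(card: ModelCard, budget: SystemBudget) -> bool:
    """True if the candidate RNN stays inside the system's resource limits."""
    return (card.latency_ms <= budget.max_latency_ms
            and card.memory_mb <= budget.max_memory_mb
            and card.bandwidth_mbps <= budget.max_bandwidth_mbps)

if __name__ == "__main__":
    candidate = ModelCard("fusion-rnn-v2", accuracy=0.94, latency_ms=12.0,
                          memory_mb=180.0, bandwidth_mbps=40.0)
    budget = SystemBudget(max_latency_ms=20.0, max_memory_mb=256.0,
                          max_bandwidth_mbps=50.0)
    print(fits_budget(candidate, budget))  # True for these example numbers
```

    A real pipeline would read the budget figures from the system model and the measurements from the Model Card file rather than hard-coding them.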
  2. Sensor fusion approaches combine data from a suite of sensors into an integrated solution that represents the target environment more accurately than an individual sensor can. Deep learning (DL) based approaches can address sensor fusion challenges more accurately than classical approaches. However, the accuracy of the selected approach can change when sensors are modified, upgraded, or swapped out within the system of sensors, which has historically required an expensive manual refactor of the sensor fusion solution. This paper develops 12 DL-based sensor fusion approaches and proposes a systematic, iterative methodology for simultaneously selecting an optimal DL approach and its hyperparameter settings. The Gradient Descent Multi-Algorithm Grid Search (GD-MAGS) methodology is an iterative grid search technique enhanced by gradient descent predictions and expanded to exchange performance measure information across concurrently running DL-based approaches. Additionally, at each iteration, the two worst-performing DL approaches are pruned to reduce resource usage as the computational expense of hyperparameter tuning grows. We evaluate the methodology on an open-source, time-series aircraft data set, training models to predict the aircraft’s altitude from multi-modal sensors that measure variables such as velocities, accelerations, pressures, temperatures, and aircraft orientation and position. We demonstrate the selection of an optimal DL model and an increase of 88% in model accuracy compared to the other 11 DL approaches analyzed. Verification of the selected model shows that it outperforms the pruned models on data from other aircraft with the same system of sensors.
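    The pruning loop in the abstract can be sketched in a few lines. The snippet below is illustrative only, not the published GD-MAGS code: evaluate is a stand-in for actually training and validating a DL approach, the gradient-descent prediction step is omitted, and all names are assumptions; it shows only the round-by-round grid evaluation and removal of the two worst performers.

```python
# Illustrative pruning loop in the spirit of the described methodology.
import itertools
import random

def evaluate(model_name, params):
    # Stand-in for training and validating one DL approach with one
    # hyperparameter setting; a deterministic pseudo-random score plays the
    # role of the real validation error.
    random.seed(hash((model_name, tuple(sorted(params.items())))) % (2**32))
    return random.uniform(0.0, 1.0)

def prune_round(candidates, grid):
    # One iteration: score every surviving approach over the hyperparameter
    # grid, then drop the two worst performers (lower error is better).
    scores = {name: min(evaluate(name, dict(zip(grid, values)))
                        for values in itertools.product(*grid.values()))
              for name in candidates}
    ranked = sorted(candidates, key=scores.get)
    return ranked[:-2] if len(ranked) > 2 else ranked[:1]

if __name__ == "__main__":
    candidates = [f"dl_approach_{i}" for i in range(12)]
    grid = {"learning_rate": [1e-3, 1e-4], "hidden_units": [64, 128]}
    while len(candidates) > 1:
        candidates = prune_round(candidates, grid)
    print("selected approach:", candidates[0])
```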
  3. Students in entry-level CS courses come from diverse backgrounds and are still learning study and time management skills. We believe that, to succeed, they must adopt a growth mindset and that the final grade should represent their final mastery of the course topics. Traditional grading systems tend to be too restrictive and hinder a growth mindset: they impose strict deadlines that do not easily account for student accommodations and learning differences, and they run into averaging and scaling issues, with 59% of a score counting as failing, making it difficult for students to redeem grades even if they later demonstrate mastery. We designed a formative/summative grading system in our CS0 and CS1 classes, for both on-campus and online students, to support a structured growth mindset. Students can redo formative assignments and are given flexible deadlines; they demonstrate their mastery in summative assignments. While inspired by other grading systems, ours works seamlessly with the auto-grading tools used in large, structured courses. Despite the flexibility, the courses maintained a level of rigor before allowing students to continue on to the next course. Overall, 65% of students resubmitted assignments to increase their scores, participated in ungraded assignments, and used formative assignments for additional practice, with no distinction by race or gender. These students went on to the traditional follow-on CS2 course, and 94% passed, compared with 71% of students who had taken CS1 with a traditional grading system.
  4. Scientists design models to understand phenomena, make predictions, and/or inform decision-making. This study targets models that encapsulate spatially evolving phenomena. Given a model, our objective is to identify the accuracy of the model across all geospatial extents. A scientist may expect these validations to occur at varying spatial resolutions (e.g., states, counties, towns, and census tracts). Assessing a model with all available ground-truth data is infeasible due to the data volumes involved. We propose a framework to assess the performance of models at scale over diverse spatial data collections. Our methodology orchestrates validation workloads while reducing memory strain, alleviating contention, enabling concurrency, and ensuring high throughput. We introduce the notion of a validation budget, an upper bound on the total number of observations used to assess the performance of models across spatial extents. The validation budget attempts to capture the distribution characteristics of the observations and is informed by multiple sampling strategies. Our design decouples validation from the underlying model-fitting libraries so that it interoperates with models constructed using different libraries and analytical engines; our research prototype currently supports Scikit-learn, PyTorch, and TensorFlow.
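    A minimal sketch of the validation-budget idea follows, under the assumption (not taken from the paper) that the budget is split across spatial extents in proportion to how many ground-truth observations each extent holds; allocate_budget is an illustrative helper, not part of the described prototype.

```python
# Illustrative validation budget: cap the total observations used for
# validation and split the cap across spatial extents proportionally to the
# observations available in each extent.
def allocate_budget(obs_counts, total_budget):
    """obs_counts maps an extent id (e.g. a county) to its observation count."""
    total = sum(obs_counts.values())
    return {extent: min(count, round(total_budget * count / total))
            for extent, count in obs_counts.items()}

if __name__ == "__main__":
    counts = {"county_A": 120_000, "county_B": 30_000, "county_C": 50_000}
    print(allocate_budget(counts, total_budget=10_000))
    # {'county_A': 6000, 'county_B': 1500, 'county_C': 2500}
```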
  5. Pirk, Holger; Heinis, Thomas (Eds.)
    Organizations collect data from various sources, and these datasets may have characteristics that are unknown. Selecting the appropriate statistical or machine learning algorithm for data analysis benefits from understanding these characteristics, such as whether a dataset contains temporal attributes. This paper presents a theoretical basis for automatically determining the presence of temporal data in a dataset given no prior knowledge about its attributes. Our method classifies each attribute as temporal, non-temporal, or hidden temporal; a hidden (grouping) temporal attribute can only be treated as temporal if its values are categorized into groups. The method uses a Ljung-Box test for autocorrelation as well as a set of metrics we propose based on the classification statistics. Our approach detects all temporal and hidden temporal attributes in 15 datasets from various domains.
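    The autocorrelation component can be illustrated with the Ljung-Box test available in statsmodels. This is only a sketch of that one piece: looks_temporal and its threshold are assumptions, and the paper's additional classification-statistic metrics and handling of hidden (grouping) temporal attributes are not reproduced here.

```python
# Illustrative use of the Ljung-Box test to flag a numeric column as
# potentially temporal: significant autocorrelation at low lags suggests the
# values are ordered in time.
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

def looks_temporal(values, lags=10, alpha=0.05):
    # Flag the column if the Ljung-Box test finds significant autocorrelation
    # up to the given lag.
    result = acorr_ljungbox(np.asarray(values, dtype=float),
                            lags=[lags], return_df=True)
    return bool(result["lb_pvalue"].iloc[0] < alpha)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(500)
    trending = np.sin(t / 20.0) + rng.normal(scale=0.1, size=t.size)  # autocorrelated
    noise = rng.normal(size=t.size)                                   # white noise
    print(looks_temporal(trending), looks_temporal(noise))            # typically: True False
```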
  6. Scrubbing sensitive data before releasing memory is a widely accepted but often ignored programming practice for developing secure software. Consequently, confidential data such as cryptographic keys, passwords, and personal data can remain in memory indefinitely, increasing the risk of exposure to attackers who can retrieve the data using memory dumps or exploit vulnerabilities such as Heartbleed and Etherleak. We propose an approach for detecting a specific memory safety bug, Improper Clearing of Heap Memory Before Release, also known as Common Weakness Enumeration (CWE) 244, in C programs. The CWE-244 bug allows the leakage of confidential information when a variable is not wiped before heap memory is freed. Our approach combines taint analysis and model checking to detect this weakness in three main phases: (1) perform a coarse, flow-insensitive, inter-procedural static analysis on the program to construct the set of pointer variables that could point to sensitive data; (2) instrument the program with the required dynamic variable tracking and assertion logic for memory wiping before deallocation; and (3) invoke a model checker, the C Bounded Model Checker (CBMC) in our case, to detect assertion violations in the instrumented program. We develop a tool implementing our instrumentation-based algorithm and provide experimental validation on the Juliet Test Suite; the tool detects all the CWE-244 instances present in the test suite. To the best of our knowledge, this is the first work that presents a solution to the problem of detecting unscrubbed secure memory deallocation violations in programs.
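    The three-phase pipeline can be pictured as a small driver script. In the sketch below, find_sensitive_pointers and instrument_frees are hypothetical placeholders for the taint-analysis and instrumentation phases (a real pass would rewrite the program rather than copy it); only the cbmc command-line invocation refers to a real tool, and a nonzero exit code is read as a possible assertion (CWE-244) violation.

```python
# Illustrative driver for the three phases described above; phases 1 and 2 are
# placeholders, and only the cbmc invocation corresponds to a real tool.
import subprocess

def find_sensitive_pointers(source_path):
    # Phase 1 (placeholder): a flow-insensitive, inter-procedural taint
    # analysis would return pointer variables that may reference sensitive data.
    return {"key_buf", "password"}

def instrument_frees(source_path, sensitive, out_path):
    # Phase 2 (placeholder): before each free(p) where p may be sensitive,
    # insert an assertion that the buffer was wiped. A real pass rewrites the
    # program; this stub just copies the file through unchanged.
    with open(source_path) as src, open(out_path, "w") as dst:
        dst.write(src.read())
    return out_path

def check_cwe244(source_path):
    sensitive = find_sensitive_pointers(source_path)
    instrumented = instrument_frees(source_path, sensitive, "instrumented.c")
    # Phase 3: CBMC checks the inserted assertions; a nonzero exit code is
    # treated here as a possible CWE-244 violation.
    result = subprocess.run(["cbmc", instrumented], capture_output=True, text=True)
    return result.returncode != 0

if __name__ == "__main__":
    print("possible CWE-244 violation:", check_cwe244("example.c"))
```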
  7. Intervention in the form of changing one’s teaching style is beneficial for boosting student grades and retention. However, despite the availability of multiple intervention approaches, a key hindrance is the assumption that students already know how to study. We dedicated time and resources not only to teaching the discipline of Computer Science, but also to teaching students how to study using techniques grounded in psychology. We offered a one-credit "booster" course to students taking CS 2: Data Structures. Through direct advisor intervention based on the first exam grade, students were encouraged to take the booster course alongside traditional interventions. We then tracked student growth across exams as students learned, and were held accountable to, study techniques not often emphasized in Computer Science. These students continued to increase their grades throughout the semester relative to the students who chose not to take the booster class. Students who were targeted for intervention but did not take the booster course continued to have lower grades throughout the semester, and only 41% of them passed the course. Students who participated in the booster course showed a 31% rate of growth across the semester, taking a failing grade to a passing grade, with 100% passing the course with a C or above. These results show a significant positive influence on student success, leading to higher retention and increased grades. If we want students to truly succeed, we must teach them how to study.